Finite-Memory Strategies in POMDPs with Long-Run Average Objectives
Authors
Abstract
Partially observable Markov decision processes (POMDPs) are standard models for dynamic systems that exhibit both probabilistic and nondeterministic behaviour in uncertain environments. We prove that in POMDPs with a long-run average objective, the decision maker has approximately optimal strategies with finite memory. Notably, this implies that the approximate value is recursively enumerable, as well as a weak continuity property of the value with respect to the transition function.
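The notion of a finite-memory strategy in a POMDP with a long-run average objective can be illustrated with a small simulation. Everything below (the two hidden states, the noisy observation model, the transition probabilities, and the strategy itself) is an illustrative assumption, not an example taken from the paper; the strategy uses the smallest possible memory, acting only on the last observation.

```python
import random

# Toy POMDP (all dynamics are illustrative assumptions):
# hidden states, two actions, and a noisy observation of the hidden state.
STATES = ["good", "bad"]
ACTIONS = ["stay", "reset"]
OBS = ["high", "low"]

def step(state, action, rng):
    """Return (next_state, observation, reward) under the assumed dynamics."""
    if action == "reset":
        nxt = "good" if rng.random() < 0.9 else "bad"
    else:  # "stay": the current state persists with probability 0.8
        nxt = state if rng.random() < 0.8 else ("bad" if state == "good" else "good")
    reward = 1.0 if nxt == "good" else 0.0
    # The observation reports the hidden state correctly with probability 0.85.
    obs = ("high" if nxt == "good" else "low") if rng.random() < 0.85 else \
          ("low" if nxt == "good" else "high")
    return nxt, obs, reward

def memoryless_strategy(obs):
    """A one-memory-state strategy: decide from the last observation only."""
    return "stay" if obs == "high" else "reset"

def average_reward(strategy, horizon=200_000, seed=0):
    """Estimate the long-run average reward of a strategy by simulation."""
    rng = random.Random(seed)
    state, obs, total = "good", "high", 0.0
    for _ in range(horizon):
        state, obs, r = step(state, strategy(obs), rng)
        total += r
    return total / horizon

print(round(average_reward(memoryless_strategy), 3))
```

The paper's result concerns strategies of this finite-memory form: memory is a finite set of internal states updated from observations, of which the memoryless strategy above is the simplest case.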
Similar resources
Markov Decision Processes with Multiple Long-Run Average Objectives
We consider Markov decision processes (MDPs) with multiple long-run average objectives. Such MDPs occur in design problems where one wishes to simultaneously optimize several criteria, for example, latency and power. The possible trade-offs between the different objectives are characterized by the Pareto curve. We show that every Pareto optimal point can be ε-approximated by a memoryless strate...
Magnifying Lens Abstraction for Stochastic Games with Discounted and Long-run Average Objectives
Turn-based stochastic games and their important subclass, Markov decision processes (MDPs), provide models for systems with both probabilistic and nondeterministic behaviors. We consider turn-based stochastic games with two classical quantitative objectives: discounted-sum and long-run average objectives. The game models and the quantitative objectives are widely used in probabilistic verification, ...
Strategy Synthesis for Stochastic Games with Multiple Long-Run Objectives
We consider turn-based stochastic games whose winning conditions are conjunctions of satisfaction objectives for long-run average rewards, and address the problem of finding a strategy that almost surely maintains the averages above a given multi-dimensional threshold vector. We show that strategies constructed from Pareto set approximations of expected energy objectives are ε-optimal for the c...
Sensor Synthesis for POMDPs with Reachability Objectives
Partially observable Markov decision processes (POMDPs) are widely used in probabilistic planning problems in which an agent interacts with an environment using noisy and imprecise sensors. We study a setting in which the sensors are only partially defined and the goal is to synthesize "weakest" additional sensors, such that in the resulting POMDP, there is a small-memory policy for the agent tha...
Reinforcement learning for long-run average cost
A large class of sequential decision-making problems under uncertainty can be modeled as Markov and Semi-Markov Decision Problems, when their underlying probability structure has a Markov chain. They may be solved by using classical dynamic programming methods. However, dynamic programming methods suffer from the curse of dimensionality and break down rapidly in face of large state spaces. In a...
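The classical dynamic-programming approach mentioned above can be sketched for the long-run average-cost criterion with relative value iteration on a small MDP. The two-state MDP below (transition matrices `P` and rewards `R`) is an illustrative assumption, not an example from the text; the point is only to show the kind of exact method that becomes infeasible as the state space grows.

```python
import numpy as np

# Relative value iteration for the optimal long-run average reward (gain)
# of a small unichain MDP. The MDP below is an illustrative assumption.

# P[a][s][s'] : transition probabilities for action a; R[a][s] : expected reward.
P = np.array([
    [[0.9, 0.1],   # action 0 from state 0
     [0.4, 0.6]],  # action 0 from state 1
    [[0.2, 0.8],   # action 1 from state 0
     [0.1, 0.9]],  # action 1 from state 1
])
R = np.array([
    [1.0, 0.0],    # action 0 rewards in states 0, 1
    [2.0, 0.5],    # action 1 rewards in states 0, 1
])

def relative_value_iteration(P, R, ref_state=0, tol=1e-10, max_iter=10_000):
    """Return (gain, bias): gain approximates the optimal average reward."""
    n = P.shape[1]
    h = np.zeros(n)
    gain = 0.0
    for _ in range(max_iter):
        q = R + P @ h              # q[a, s] = one-step lookahead value
        v = q.max(axis=0)          # greedy backup over actions
        gain = v[ref_state]        # subtract a reference value to keep h bounded
        h_new = v - gain
        if np.max(np.abs(h_new - h)) < tol:
            return gain, h_new
        h = h_new
    return gain, h

gain, h = relative_value_iteration(P, R)
print(round(gain, 4))
```

The table-based backup over all states in each iteration is exactly what the curse of dimensionality defeats: its cost grows with the size of the state space, which motivates the simulation-based reinforcement-learning methods the text discusses.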
Journal
Journal title: Mathematics of Operations Research
Year: 2022
ISSN: 0364-765X, 1526-5471
DOI: https://doi.org/10.1287/moor.2020.1116